Try building rustc with N codegen units #81214
PR CI x86_64-gnu-llvm-9 took 77m (usually it takes ~40m). @bors try @rust-timer queue |
Awaiting bors try build completion. |
⌛ Trying commit ed6643a390a37d29cb0e8cf323bf3c46cf3a8c9e with merge ad1e2edcc154dde638611f902f9e504f5a8d9c6f... |
☀️ Try build successful - checks-actions |
Queued ad1e2edcc154dde638611f902f9e504f5a8d9c6f with parent a4cbb44, future comparison URL. @rustbot label: +S-waiting-on-perf |
Finished benchmarking try commit (ad1e2edcc154dde638611f902f9e504f5a8d9c6f): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never |
The size of unpacked toolchain was also reduced from 404M to 354M (as installed by rustup-toolchain-install-master with default options). Now with two codegen units (PR CI x86_64-gnu-llvm-9 took 63m). @bors try @rust-timer queue |
Awaiting bors try build completion. |
⌛ Trying commit 4f9f69512c7c3ff4fdfa298c28f2d339015f0b2d with merge 4c4da1d578ee027b619ce11c645e6de512d26927... |
☀️ Try build successful - checks-actions |
Queued 4c4da1d578ee027b619ce11c645e6de512d26927 with parent 4d0dd02, future comparison URL. @rustbot label: +S-waiting-on-perf |
Finished benchmarking try commit (4c4da1d578ee027b619ce11c645e6de512d26927): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never |
Huge improvements of up to 15.5%. |
With three codegen units (PR CI x86_64-gnu-llvm-9 took 44m). @bors try @rust-timer queue |
Awaiting bors try build completion. |
⌛ Trying commit 7848a41345dcccb14b68adaee9526ebd2117c6be with merge 01e13b863b2b389489961bf7431b7a315264a43e... |
☀️ Try build successful - checks-actions |
Queued 01e13b863b2b389489961bf7431b7a315264a43e with parent 85e355e, future comparison URL. @rustbot label: +S-waiting-on-perf |
Finished benchmarking try commit (01e13b863b2b389489961bf7431b7a315264a43e): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never |
With rust.codegen-units=4 (PR CI x86_64-gnu-llvm-9 took 44 min): @bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit 935f7819e5068b1d43cd9dbe904eb626d2961987 with merge bb215a40be948264811fbac2da595208a74e625c... |
☀️ Try build successful - checks-actions |
Queued bb215a40be948264811fbac2da595208a74e625c with parent 2918062, future comparison URL. |
Finished benchmarking try commit (bb215a40be948264811fbac2da595208a74e625c): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never |
With rust.codegen-units=5 (PR CI x86_64-gnu-llvm-9 took 42 min): @bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit f8b67ff with merge 03192e87b7e4444897b09bec2729882ed234cbdd... |
☀️ Try build successful - checks-actions |
Queued 03192e87b7e4444897b09bec2729882ed234cbdd with parent e9920ef, future comparison URL. |
Finished benchmarking try commit (03192e87b7e4444897b09bec2729882ed234cbdd): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never |
The table, updated with new measurements, is in #81214 (comment). @Mark-Simulacrum what would be the next steps? Is that sufficient to make a decision, or would you like me to repeat the try builds once again?
I am curious what the effect of only building libstd with a single codegen unit would be. |
The standard library is already built with a single codegen unit. |
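For readers following along, the settings discussed in this thread can be sketched as a `config.toml` fragment. This is only an illustration: it assumes the `rust.codegen-units=N` values quoted above map to the `codegen-units` key in the `[rust]` section, and that the single codegen unit for the standard library corresponds to `codegen-units-std`. The value 5 is simply one of the settings tried here, not a recommendation.

```toml
# Illustrative sketch only; values taken from the try builds in this thread.
[rust]
# Number of codegen units used when building rustc itself
# (assumed equivalent to the `rust.codegen-units=5` try build).
codegen-units = 5
# The standard library is kept at a single codegen unit, as stated above.
codegen-units-std = 1
```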
I might be misunderstanding the table, but by my reading it looks like switching to bootstrapping with codegen units affecting only rustc has actually introduced some pretty major regressions - maybe something has changed on master in the meantime? I guess I am not feeling like we have a complete picture of the constraints here and the effects of various switches. Bootstrap columns in the table seem pretty useless to me -- they're not indicative of a difference between variant 1 and 2 in terms of performance of rustc at runtime. There are several inputs which can be changed:
From each of these we have several metrics to evaluate the effect:
I think we don't want to alter the settings when perf.rlo is run, as that skews our data and generally doesn't make for something that's easy to compare. That means any changes in configuration in this PR should be gated on Rust's CI (somehow) rather than just any x.py run. Right now I'm struggling to read the table in a conclusive way. I don't know exactly what would help, but it seems like the primary problem with the current table is that it combines data along these axes in a way that, at least for me, is hard to compare. Maybe splitting it apart into multiple tables or putting each metric in its own table would be helpful, I'm not sure.
As far as I can see, that is mostly a reflection of the degree to which optimizer decisions made when building rustc affect the results, and of how inaccurate the benchmarks are as a consequence. After spending some time profiling, I am entirely unsurprised, given that 50% of all instructions in inflate are in just one function, for keccak in 2 functions, for match-stress-enum in 2 functions, and for ctfe-stress-4 in around 10. Just as in the match-stress-enum case, it should be easy to isolate these and direct the optimizer towards making the same decisions it is making now, if one desired to do so. Given that reducing the number of CGUs for tooling takes around 10 min longer in a try build ("Try 1" vs "Try 2"), that building with one or two CGUs takes at least 10 min longer ("Try 1" & "Try 2") than the other options, and that builds with one, two, three, and four CGUs have regressions in perf results, what about reducing the number of CGUs for rustc to 5, which reduces the size of the toolchain by 8 MB, took 54 min in the try builder, and gave the following perf results?
r? @ghost